Ben Chuanlong Du's Blog


Spark Issue: _pickle.PicklingError: args[0] from __newobj__ args has the wrong class

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Please refer to Spark Issue: Task Not Serializable for a similar serialization issue in Spark/Scala.

Symptom

A PySpark job that uses a UDF or pandas UDF fails with the following error:

_pickle.PicklingError: args[0] from __newobj__ args has the wrong class

Cause

PySpark serializes UDFs, together with the objects they reference, using cloudpickle. Pickle raises this error when the class recorded for an object does not match the object's actual class at pickling time. Objects that load lazily and swap their own class on first use, such as NLTK's corpus loaders, appear to be a common trigger.

For example, if you have the following import:

from nltk.corpus import stopwords

then calling the following inside a UDF or pandas UDF might cause this issue:

stopwords.words("english")

Solution

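Move the call to stopwords.words("english") out of the UDF or pandas UDF and assign its result to a global variable. The UDF then references only a plain list of strings, which pickles without problems.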
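
The fix can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper names (load_stopwords, make_remove_stopwords, make_udf) are hypothetical, and the sketch passes the word set into the UDF through a closure instead of a module-level global, which has the same effect of keeping the NLTK loader out of the pickled function. It assumes pyspark and the NLTK stopwords corpus are installed on the driver.

```python
def load_stopwords():
    """Evaluate stopwords.words("english") once, on the driver."""
    # Imported here so that only the driver touches NLTK's lazy corpus loader.
    from nltk.corpus import stopwords
    return frozenset(stopwords.words("english"))


def make_remove_stopwords(stop_words):
    """Build a plain Python function that captures only a set of strings."""
    def remove_stopwords(text):
        if text is None:
            return None
        return " ".join(w for w in text.split() if w.lower() not in stop_words)
    return remove_stopwords


def make_udf(stop_words):
    """Wrap the function as a Spark UDF (hypothetical helper)."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    return udf(make_remove_stopwords(stop_words), StringType())
```

On the driver, something like df.withColumn("clean", make_udf(load_stopwords())("text")) would then apply it, where df and the column names are placeholders. The function that Spark pickles never refers to stopwords itself, only to the strings it produced.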

References

Python: spark-submit raises "Pickling error": "_pickle.PicklingError: args[0] from __newobj__ args has the wrong class"

_pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py
